Introduction

Polly Notebook is a scalable analytics platform that lets you perform data analysis remotely in a Jupyter-like notebook. You can choose the compute capacity and software environment that suit your needs, and share your analyses with your peers for seamless team collaboration.

Polly Notebook provides a Jupyter-like interface on the cloud. Its advantages over hosting notebooks locally include:

Ready-to-code platform: Installing and maintaining an environment for every notebook can be a frustrating overhead. We provide custom Docker environments that come pre-installed with modules commonly used in bioinformatics. You can also add your own custom Docker environments.
Cloud storage: With Polly Notebooks, you can store your data files and notebooks in a single place and have them ready to run in less than 5 minutes from anywhere in the world. No need to fetch your code from Bitbucket anymore!
Share and collaborate on your projects: Polly allows sharing of projects, so you can review and refer to notebooks within your team.
Resource management: Many biological analyses (such as RNA-seq) are resource-intensive in terms of RAM, processing power, or both. Without enough resources, you either have to scramble for a bigger machine or accept slower processing. Polly lets you scale up your resources at any time.

Accessing Polly Notebooks

Navigate to the Polly Project in which the analysis will be performed.

Figure 1. Polly Projects

Then open the Notebook tab to reach the Polly Notebook page.

Figure 2. Notebook Tab

You can access Polly Notebooks in three ways:

Create a new notebook: Click on the New Notebook button located on the right side of the search bar to create a new notebook.

Figure 3. New Notebook button to create a new notebook

Upload a notebook: Click on the Upload Notebook button on the right side of the search bar.

Figure 4. Upload Notebook button to upload a notebook

You can upload a notebook from your local system or from a cloud storage service (Dropbox, Google Drive, or Box). To upload from your local system, drag and drop the files. To upload from a cloud storage service, select the relevant option, log in to the service, and select the files to upload.

Figure 5. Window to import notebook from local or other cloud storage services

Open an existing notebook: Click on the name of any existing notebook to open it.

Figure 6. Opening a notebook

Pre-Configured Environments

Polly supports several notebook environments, packaged as Docker images, to cater to the needs of different users. Each image is built for a particular kind of analysis, ranging from basic scripting to processing large datasets or training and testing ML models. The menu to select an environment pops up when you open a newly created or uploaded notebook for the first time.

Figure 7. Menu to select various available environments

The various notebook environments supported are as follows:

Environment	Usage	R libraries	Python Modules	System
R	General R scripting	askpass 1.1 assertthat 0.2.1 backports 1.1.5 base64enc 0.1-3 BH 1.72.0-3 BiocManager 1.30.10 bitops 1.0-6 brew 1.0-6 callr 3.4.2 cli 2.0.2 clipr 0.7.0 clisymbols 1.2.0 colorspace 1.4-1 commonmark 1.7 covr 3.4.0 crayon 1.3.4 crosstalk 1.0.0 curl 4.3 cyclocomp 1.1.0 desc 1.2.0 devtools 2.2.2 diffobj 0.2.3 digest 0.6.25 DT 0.12 ellipsis 0.3.0 evaluate 0.14 fansi 0.4.1 farver 2.0.3 fastmap 1.0.1 foghorn 1.1.4 fs 1.3.1 gargle 0.4.0 ggplot2 3.2.1 gh 1.1.0 git2r 0.26.1 glue 1.3.1 gmailr 1.0.0 gridExtra 2.3 gtable 0.3.0 highlight 0.5.0 highr 0.8 htmltools 0.4.0 htmlwidgets 1.5.1 httpuv 1.5.2 httr 1.4.1 hunspell 3.0 ini 0.3.1 IRdisplay 0.7.0 IRkernel 1.1 jsonlite 1.6.1 knitr 1.28 labeling 0.3 later 1.0.0 lazyeval 0.2.2 leaflet 2.0.3 leaflet.providers 1.9.0 lifecycle 0.1.0 lintr 2.0.1 magrittr 1.5 markdown 1.1 memoise 1.1.0 mime 0.9 mockery 0.4.2 munsell 0.5.0 openssl 1.4.1 parsedate 1.2.0 pbdZMQ 0.3-3 pillar 1.4.3 pingr 2.0.0 pkgbuild 1.0.6 pkgconfig 2.0.3 pkgdown 1.4.1 pkgload 1.0.2 plyr 1.8.6 png 0.1-7 PollyConnector 0.0.0 praise 1.0.0 prettyunits 1.1.1 processx 3.4.2 promises 1.1.0 ps 1.3.2 purrr 0.3.3 R6 2.4.1 rappdirs 0.3.1 raster 3.0-12 rcmdcheck 1.3.3 RColorBrewer 1.1-2 Rcpp 1.0.3 rematch 1.0.1 rematch2 2.1.0 remotes 2.1.1 repr 1.1.0 reshape2 1.4.3 reticulate 1.14 rex 1.1.2 rhub 1.1.1 RJSONIO 1.3-1.4 rlang 0.4.5 rmarkdown 2.1 roxygen2 7.0.2 rprojroot 1.3-2 rstudioapi 0.11 rversions 2.0.1 rvest 0.3.5 scales 1.1.0 selectr 0.4-2 sessioninfo 1.1.1 shiny 1.4.0 sourcetools 0.1.7 sp 1.4-1 spelling 2.1 stringi 1.4.6 stringr 1.4.0 sys 3.3 testthat 2.3.2 tibble 2.1.3 tinytex 0.20 triebeard 0.3.0 urltools 1.7.3 usethis 1.5.1 utf8 1.1.4 uuid 0.1-4 vctrs 0.2.3 viridis 0.5.1 viridisLite 0.3.0 whisker 0.4 whoami 1.3.0 withr 2.1.2 xfun 0.12 xml2 1.2.2 xmlparsedata 1.0.3 xopen 1.0.0 xtable 1.8-4 yaml 2.2.1	None
Python 2	General Python 2 scripting	None	attrs 19.3.0 backports-abc 0.5 backports.functools-lru-cache 1.6.1 backports.shutil-get-terminal-size 1.0.0 bleach 3.1.0 certifi 2019.11.28 chardet 3.0.4 cmapPy 1.0.5 configparser 4.0.2 contextlib2 0.6.0.post1 cycler 0.10.0 decorator 4.4.1 defusedxml 0.6.0 entrypoints 0.3 enum34 1.1.6 funcsigs 1.0.2 functools32 3.2.3.post2 futures 3.3.0 h5py 2.10.0 idna 2.8 importlib-metadata 1.5.0 ipaddress 1.0.23 ipykernel 4.10.1 ipython 5.9.0 ipython-genutils 0.2.0 ipywidgets 7.4.2 Jinja2 2.11.1 jsonschema 3.2.0 jupyter-client 5.3.4 jupyter-core 4.6.2 kiwisolver 1.1.0 MarkupSafe 1.1.1 matplotlib 2.2.4 mistune 0.8.4 nbconvert 5.6.1 nbformat 4.4.0 notebook 5.7.8 numpy 1.16.6 pandas 0.24.1 pandocfilters 1.4.2 pathlib2 2.3.5 pexpect 4.8.0 pickleshare 0.7.5 plotly 3.7.0 prometheus-client 0.7.1 prompt-toolkit 1.0.18 ptyprocess 0.6.0 PubChemPy 1.0.4 Pygments 2.5.2 pyparsing 2.4.6 pyrsistent 0.15.7 python-dateutil 2.8.1 pytz 2019.3 pyzmq 18.1.1 qgrid 1.1.1 requests 2.21.0 retrying 1.3.3 scandir 1.10.0 scikit-learn 0.20.3 scipy 1.2.3 Send2Trash 1.5.0 simplegeneric 0.8.1 singledispatch 3.4.0.3 six 1.14.0 subprocess32 3.5.4 terminado 0.8.3 testpath 0.4.4 tornado 5.1.1 traitlets 4.3.3 urllib3 1.24.3 wcwidth 0.1.8 webencodings 0.5.1 widgetsnbextension 3.4.2 zipp 1.1.0
Python 3	General Python 3 scripting	None	alembic 1.4.1 async-generator 1.10 attrs 19.3.0 awscli 1.17.12 backcall 0.1.0 bleach 3.1.1 botocore 1.14.12 certifi 2019.11.28 chardet 3.0.4 colorama 0.4.3 cycler 0.10.0 decorator 4.4.2 defusedxml 0.6.0 docutils 0.15.2 entrypoints 0.3 idna 2.8 importlib-metadata 1.5.0 ipykernel 5.1.4 ipython 7.13.0 ipython-genutils 0.2.0 ipywidgets 7.5.1 jedi 0.16.0 Jinja2 2.11.1 jmespath 0.9.5 jsonschema 3.2.0 jupyter-client 6.0.0 jupyter-core 4.6.3 jupyter-dashboards 0.7.0 jupyterhub 0.9.4 kiwisolver 1.1.0 Mako 1.1.2 MarkupSafe 1.1.1 matplotlib 2.2.3 mistune 0.8.4 nbconvert 5.6.1 nbformat 5.0.4 notebook 5.7.2 numpy 1.18.1 pamela 1.0.0 pandas 1.0.1 pandocfilters 1.4.2 parso 0.6.2 pexpect 4.8.0 pickleshare 0.7.5 prometheus-client 0.7.1 prompt-toolkit 3.0.3 ptyprocess 0.6.0 pyasn1 0.4.8 Pygments 2.5.2 pyparsing 2.4.6 pyrsistent 0.15.7 python-dateutil 2.8.1 python-editor 1.0.4 python-oauth2 1.1.1 pytz 2019.3 PyYAML 5.3 pyzmq 19.0.0 qgrid 1.3.0 requests 2.21.0 rsa 3.4.2 s3transfer 0.3.3 Send2Trash 1.5.0 six 1.14.0 SQLAlchemy 1.3.13 terminado 0.8.3 testpath 0.4.4 tornado 5.1.1 traitlets 4.3.3 urllib3 1.24.3 wcwidth 0.1.8 webencodings 0.5.1 widgetsnbextension 3.5.1 zipp 3.1.0
Pollyglot	Multiple kernels (R, Python and bash) in the same notebook/environment	All libraries from base R Docker, Seurat, pagoda2, CellRanger, SingleR	All libraries from base Python Docker, Scanpy, velocyto, scVI (built on PyTorch), louvain
Barcoded Bulk RNA-seq	Alignment and processing of barcoded RNA-seq FASTQ files	All libraries from base R Docker, limma, affy, DESeq2, edgeR, cqn, sva, biomaRt, mygene, amritr, Boruta, fgsea, GSVA, ReactomePA, xCell, SingleR, enrichR, org.Hs.eg.db, org.Mm.eg.db, AnnotationDbi, clusterProfiler		STAR, subread-1.6.4-source, gosaamer, FastQC, MultiQC, Picard
Machine Learning in Python	Training, testing and validation of ML models	None	All libraries from base Python Docker, h5py, keras, lightgbm, tensorflow, xgboost
Single Cell Downstream	Single-cell analysis	All libraries from base R Docker, Seurat, pagoda2, CellRanger, SingleR	All libraries from base Python Docker, Scanpy, velocyto, scVI (built on PyTorch), louvain
Data Exploration	R and Python for general data analysis	All libraries from base R Docker	All libraries from base Python Docker
RNA-seq Downstream	Transcriptomics analysis	All libraries from base R Docker, limma, affy, DESeq2, edgeR, cqn, sva, biomaRt, mygene, amritr, Boruta, fgsea, GSVA, ReactomePA, xCell, SingleR, enrichR, org.Hs.eg.db, org.Mm.eg.db, AnnotationDbi, clusterProfiler	All libraries from base Python Docker

Computational Machines Available

Datasets range from a few MBs to hundreds of GBs, so processing and analyzing them can require anything from a small machine to a large workstation. Polly Notebook supports configurations with 2 to 72 GB of RAM and 1 to 36 CPU cores. The menu to select a machine configuration pops up when you open a newly created or uploaded notebook for the first time.

Figure 8. Menu to select various machine configurations

The predefined machine configurations cover a wide variety of use cases, and more configurations can be made available on request (contact us at polly@elucidata.io). The configurations fall into three broad categories:

General purpose: Configurations from 1 to 4 CPU cores and 2 to 16 GB RAM fall under this category. The various configurations are:

Name	CPU/Cores	RAM
Polly small	1	2 GB
Polly medium	2	4 GB
Polly large	2	8 GB
Polly x-large	4	16 GB

Compute Intensive: Configurations from 16 to 36 CPU cores and 32 to 72 GB RAM fall under this category. The various configurations are:

Name	CPU/Cores	RAM
Polly 2x-large	16	32 GB
Polly 3x-large	36	72 GB

Memory-Optimized: Configurations from 4 to 8 CPU cores and 32 to 64 GB RAM fall under this category. The various configurations are:

Name	CPU/Cores	RAM
Polly 2x-large	4	32 GB
Polly 3x-large	8	64 GB
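If you want to choose a configuration programmatically, for example when planning several analyses, the three tables above can be encoded as data. This is a minimal sketch under stated assumptions: "smallest" here means fewest cores, then least RAM, because this document does not describe relative cost, and the helper name `smallest_config` is hypothetical. Since the Memory-Optimized names repeat the Compute Intensive names in the tables, each entry also records its category.

```python
from typing import NamedTuple, Optional


class MachineConfig(NamedTuple):
    category: str
    name: str
    cores: int
    ram_gb: int


# Values copied from the three configuration tables above.
MACHINE_CONFIGS = [
    MachineConfig("General purpose", "Polly small", 1, 2),
    MachineConfig("General purpose", "Polly medium", 2, 4),
    MachineConfig("General purpose", "Polly large", 2, 8),
    MachineConfig("General purpose", "Polly x-large", 4, 16),
    MachineConfig("Compute Intensive", "Polly 2x-large", 16, 32),
    MachineConfig("Compute Intensive", "Polly 3x-large", 36, 72),
    MachineConfig("Memory-Optimized", "Polly 2x-large", 4, 32),
    MachineConfig("Memory-Optimized", "Polly 3x-large", 8, 64),
]


def smallest_config(min_cores: int, min_ram_gb: int) -> Optional[MachineConfig]:
    """Return the config meeting both minimums with the fewest cores, then least RAM.

    Returns None when no listed configuration is large enough; in that case a
    custom configuration would have to be requested.
    """
    candidates = [
        c for c in MACHINE_CONFIGS if c.cores >= min_cores and c.ram_gb >= min_ram_gb
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c.cores, c.ram_gb))
```

For example, `smallest_config(4, 20)` selects the Memory-Optimized Polly 2x-large (4 cores, 32 GB), whereas a job needing 40 cores returns `None`.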

Other Useful Features

A few other features may come in handy when using Polly Notebooks, such as cloning or deleting a notebook and changing the machine configuration of an existing notebook. The steps for each are as follows:

Cloning a notebook: To clone an existing notebook:
- Click the three-dot menu at the end of the notebook entry you want to clone. A menu with various options opens.
Figure 9. Menu to select the Clone option
- Scroll down the menu and click the Clone option to create a copy of the notebook.
Figure 10. Clone option in the menu
- A copy of the notebook is created.
Figure 11. Cloned notebook
Deleting a notebook: To delete an existing notebook:
- Click the three-dot menu at the end of the notebook entry you want to delete. A menu with various options opens.
Figure 12. Menu to delete the notebook
- Scroll down the menu and click the Delete option to delete the notebook.
Figure 13. Delete option to delete a notebook
- The selected notebook is deleted.
Changing machine configuration: Polly lets you change a notebook's machine configuration so that it matches the computing power required at each step of your analysis. To change it:
- Click on the Edit button located at the end of each notebook entry, right next to the (three dots) menu.
Figure 14. Edit button to change machine configuration
- A menu with the available machine configurations opens, with the options listed under the Machine type to run segment. Select the appropriate option to change the configuration.
Figure 15. Machine configurations options

Getting started with Polly Notebook

Once you select a pre-configured Docker environment and a computational machine, the Polly Notebook starts launching in a new browser tab. A progress bar shows that your notebook is opening; how long this takes can depend on the type of machine you chose.

Figure 16. Progress bar upon launching a Polly Notebook

Once the server is ready, the new notebook opens in the browser. The interface is very similar to that of a Jupyter notebook.

Figure 17. Polly Notebook interface

At the top left is the notebook's name (a pre-defined name if you created a new notebook). At the top right is the Polly Project name, and below it, the kernel/Docker environment selected for the notebook.

Menu bar: The menu bar contains multiple tabs that give access to various notebook functions. For example, under the File tab, you can select the Rename option to change the name of the active notebook.
Toolbar: It contains icons for frequently used operations.

Structure of Polly Notebook

A Polly Notebook consists of a sequence of cells of three types: markdown cells, raw cells, and code cells. Each cell can hold multi-line content and can be executed by pressing Shift+Enter, by clicking the Run cells option in the Cell tab of the menu bar, or by clicking the “Play” button in the toolbar.

Figure 18. Structure of a Polly Notebook

Markdown cells

Markdown cells let you document your computational process in rich text. Markdown headings give the notebook structure, and simple markup lets you mark text as italic or bold, create lists, and so on.

Raw cells

You can use raw cells for content that should go directly to the output. A raw cell is not evaluated by the notebook; its contents are passed through unmodified, for example when the notebook is converted to another format with nbconvert.

Code cells

A code cell lets you write and edit code, which is executed by the kernel selected when the notebook was launched. In some environments, code cells in the same notebook can use different programming languages, as seen at the bottom right of Figure 18. That example uses the Pollyglot Docker environment, which supports multiple kernels in the same notebook, so you can choose the kernel you prefer to code in.

When a code cell is executed, its code is sent to the kernel and the results are displayed as output below the cell. To execute a code cell, click the “Run” button; to stop the computation of a running cell, click the “Interrupt” button in the toolbar.

Figure 19. Running a code cell
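Because Polly Notebook is Jupyter-like, its three cell types presumably map onto the standard Jupyter notebook file format (`.ipynb`, nbformat 4), although this document does not state it. Under that assumption, the sketch below builds a minimal notebook containing one markdown, one raw, and one code cell using only the standard `json` module; the helper names `make_cell` and `make_notebook` are hypothetical.

```python
import json
from typing import Any, Dict, List


def make_cell(cell_type: str, source: str) -> Dict[str, Any]:
    """Build one cell in the nbformat 4 layout: "markdown", "raw" or "code"."""
    if cell_type not in ("markdown", "raw", "code"):
        raise ValueError(f"unknown cell type: {cell_type!r}")
    cell: Dict[str, Any] = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Only code cells carry an execution count and a list of outputs.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def make_notebook(cells: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap the cells in the top-level notebook structure."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


notebook = make_notebook([
    make_cell("markdown", "# Sample QC\nCheck *read counts* per sample."),
    make_cell("raw", "Passed through unevaluated."),
    make_cell("code", "print('hello from a code cell')"),
])
text = json.dumps(notebook, indent=1)
```

Writing `text` to a file such as `sample_qc.ipynb` gives a notebook you could try with the Upload Notebook button. No kernel metadata is set here, which should be acceptable since Polly asks for an environment when an uploaded notebook is first opened.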

Polly Offerings

The Polly Offerings tab in the menu bar contains two options, Terminal and File Explorer, which are described below.

Figure 20. Polly Offerings tab

Terminal

Selecting the Terminal option opens a command-line interface in a new browser tab, where you can run any set of commands. All files in the Docker environment are accessible and can be managed from the terminal. The terminal also lets you install Python or R packages (as described later), manage system binaries and system configurations, and work with code repositories hosted on GitHub, Bitbucket, etc.

Figure 21. Terminal screen window

File Explorer

Selecting the File Explorer option likewise opens a new browser tab, where you can browse the files and directories in the Docker environment. The Files tab lists all files and directories, and you can delete or upload files, or open a file to modify it.

Figure 22. File Explorer window

You can also launch a new notebook with the New button at the top right of the File Explorer page. The new notebook opens in a new tab and is automatically added to the Notebook section of the same Polly Project as the original notebook.

Figure 23. Launching a new notebook using File Explorer

The File Explorer window also lets you view, edit, or create files interactively. Use the Text File option under the New button to create a new text file. To view or edit a file, click it, and a text editor opens in a new browser tab where you can make and save changes. The text editor also lets you select a programming language from the Language tab for the file being edited.

Figure 24. Opening a file using a Text editor

Accessing Project files in Notebook

Accessing individual files

If your analysis requires input files stored in the Polly Project, you can fetch them from within the notebook. List all the files in the Project, then download an individual file, using the following commands:

## Lists all the files present in the project
list_project_file()
## The file will be downloaded in the current working directory
download_project_file('sample_file.csv')

After finishing the analysis, you can push newly generated output files back to the Project with the following command:

## Save the file to the project
save_file_to_project('sample_file.csv')

Figure 25. Accessing individual files in a notebook

Accessing directories

Similar to individual files, you can fetch directories from the Project.

The contents of any directory within a Project can be listed by running the following command in the notebook terminal:

polly files list --workspace-path "polly://<directory-path>" -y

The directory path must start with “polly://”. For example, to list the contents of a folder called “Data” in the Project, run the following command in the notebook terminal:

polly files list --workspace-path "polly://Data" -y

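To make a Project directory available in the notebook, sync it from the Project (the source, -s) to a notebook directory (the destination, -d) by running the following command in the notebook terminal:

polly files sync -s "polly://<project-directory>" -d "<notebook-directory>" -y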

For example, to sync the Project folder “Data” into a notebook folder called “Input”, run:

polly files sync -s "polly://Data" -d "Input" -y

To save a notebook directory back to the Project, run the same command with the notebook directory as the source and the Polly Project path as the destination:

polly files sync -s "<notebook-directory>" -d "polly://<project-directory>" -y

For example, to save the notebook folder “Output” to the root of the Polly Project, run:

polly files sync -s "Output" -d "polly://" -y
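If you run these transfers repeatedly from a Python code cell, the commands above can be assembled as argument lists and passed to `subprocess`, which avoids shell-quoting problems with folder names that contain spaces. This is a minimal sketch: it assumes the `polly` CLI is on the PATH of the notebook's Python kernel (as it is in the notebook terminal), it uses only the flags shown above, and all helper names are hypothetical. The builders only return commands; `run` actually executes one.

```python
import subprocess
from typing import List

PROJECT_PREFIX = "polly://"


def project_path(path: str = "") -> str:
    """Return a Project path starting with polly://; "" maps to the Project root."""
    if path.startswith(PROJECT_PREFIX):
        return path
    return PROJECT_PREFIX + path.lstrip("/")


def list_command(project_dir: str = "") -> List[str]:
    """Build `polly files list` for a directory inside the Project."""
    return ["polly", "files", "list", "--workspace-path", project_path(project_dir), "-y"]


def sync_command(source: str, destination: str) -> List[str]:
    """Build `polly files sync` with -s as the source and -d as the destination."""
    return ["polly", "files", "sync", "-s", source, "-d", destination, "-y"]


def fetch_command(project_dir: str, notebook_dir: str) -> List[str]:
    """Project -> notebook, e.g. polly://Data into Input."""
    return sync_command(project_path(project_dir), notebook_dir)


def save_command(notebook_dir: str, project_dir: str = "") -> List[str]:
    """Notebook -> Project, e.g. Output into the Project root."""
    return sync_command(notebook_dir, project_path(project_dir))


def run(cmd: List[str]) -> None:
    """Execute a command; raises CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)
```

Inside a notebook, `run(fetch_command("Data", "Input"))` performs the “Data” to “Input” example above, and `run(save_command("Output"))` performs the “Output” example.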

Installing Packages

Although the customized Docker environments provide most of the packages and tools you need, you may sometimes need to install new packages to carry on the analysis. You can install them either from a notebook cell or from the terminal, whichever is more convenient.

Installing packages and system binaries using the Notebook cell

You can install the required packages and system binaries by running the usual installation codes on the code cell of a notebook.

For Python packages: You can run the following command in the code cell with Python kernel selected to install the required packages.

# For installing packages, DON'T forget to use sudo. It will not ask for a password.
!sudo pip install package-name
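After installing, you can check from a Python code cell that the new package is importable in the running kernel. This is a minimal sketch using the standard-library `importlib`; the helper name `missing_modules` is hypothetical. Note that the name passed to pip is not always the import name (for example, scikit-learn is imported as `sklearn`), and if `sudo pip` installed into a different Python than the kernel uses, the package will show up here as missing.

```python
import importlib
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found by this kernel."""
    # Clear the import system's directory caches so that packages installed
    # after the kernel started are noticed.
    importlib.invalidate_caches()
    return [name for name in module_names if importlib.util.find_spec(name) is None]
```

For example, `missing_modules(["numpy", "pandas", "sklearn"])` returns an empty list when all three are installed.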

For R packages: You can run the following command in the code cell with R kernel selected to install the required packages.

# For installing packages, DON'T forget to use sudo. It will not ask for a password.
## Installing CRAN packages
!sudo R -e 'install.packages(c("package-name"), repos="https://cloud.r-project.org/")'

## Installing Bioconductor packages
!sudo R -e 'BiocManager::install(c("package-name"), update = TRUE, ask = FALSE)'
# If BiocManager is not found, install it first with the following command and re-run the above command.
!sudo R -e 'install.packages(c("BiocManager"), repos="https://cloud.r-project.org/")'
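When installing several R packages from a Python code cell, you may prefer to build the `sudo R -e ...` call in Python and pass it to `subprocess` as an argument list, which avoids nesting R's double quotes inside shell single quotes. This is a minimal sketch: the CRAN mirror and the `BiocManager::install` arguments are the ones used above, and the helper names are hypothetical. Be aware that R's `install.packages` usually reports a failed installation as a warning rather than an error, so `R` may still exit successfully; confirm with `library()` afterwards.

```python
import re
import subprocess
from typing import List, Sequence

CRAN_REPO = "https://cloud.r-project.org/"
# R package names use letters, digits and dots, and start with a letter.
_R_PACKAGE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9.]*")


def r_vector(packages: Sequence[str]) -> str:
    """Render package names as an R character vector, e.g. c("limma", "edgeR")."""
    for name in packages:
        if not _R_PACKAGE_NAME.fullmatch(name):
            raise ValueError(f"not a valid R package name: {name!r}")
    return "c(" + ", ".join(f'"{name}"' for name in packages) + ")"


def r_install_command(packages: Sequence[str], bioconductor: bool = False) -> List[str]:
    """Build the same `sudo R -e ...` call as above, as an argument list."""
    if bioconductor:
        expr = f"BiocManager::install({r_vector(packages)}, update = TRUE, ask = FALSE)"
    else:
        expr = f'install.packages({r_vector(packages)}, repos="{CRAN_REPO}")'
    return ["sudo", "R", "-e", expr]


def install_r_packages(packages: Sequence[str], bioconductor: bool = False) -> None:
    """Run the install; raises CalledProcessError if R exits with an error."""
    subprocess.run(r_install_command(packages, bioconductor), check=True)
```

For example, `install_r_packages(["DESeq2", "edgeR"], bioconductor=True)` installs both Bioconductor packages in one R session.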

Figure 26. Installing R and Python packages

For System binaries: You can also install system binaries by running the following commands in a code cell with the bash kernel selected.

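# System binaries (replace package-name with the package you need; -y answers the
# confirmation prompt, which a notebook cell cannot do interactively)
sudo apt install -y package-name

# If the above command reports that the package cannot be found, update the system package index and run it again
sudo apt-get update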
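Before installing, it can help to check from a Python code cell which command-line tools are already available. This is a minimal sketch using the standard-library `shutil.which`; the helper names are hypothetical. It checks command names, which do not always match apt package names (for example, the Ubuntu package providing `blastn` is `ncbi-blast+`).

```python
import shutil
from typing import Dict, Iterable, List, Optional


def locate_binaries(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each command name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


def missing_binaries(names: Iterable[str]) -> List[str]:
    """Return the command names that are not available on PATH."""
    return [name for name, path in locate_binaries(names).items() if path is None]
```

For example, `missing_binaries(["fastqc", "multiqc"])` lists whichever of the two tools still needs to be installed.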

Figure 27. Installing System binaries using the Notebook code cell

Installing packages and system binaries via Terminal

You can also install packages and system binaries from the terminal, which you can open as described above. The installation commands are almost the same as those used in a notebook code cell.

For Python packages: You can run the following command directly on the terminal to install the required packages. Once the package installation is successful, you can import the package in your notebook.

# For installing packages, DON'T forget to use sudo. It will not ask for a password.
> sudo pip install package-name

Figure 28. Installing Python packages using the Terminal

For R packages: Open an R session in the terminal using “sudo R”, then install the required R packages. Once the installation succeeds, you can load the library in your notebook's R kernel as usual.

## You can install R packages by opening an R session in the terminal
> sudo R

## Install CRAN packages using the following command
> install.packages(c('pkg-name'), dependencies=TRUE, repos="https://cloud.r-project.org/")
# For the CRAN mirror, you can use any mirror of your choice, e.g. "https://cloud.r-project.org/" or "https://cran.cnr.berkeley.edu/"

## Install Bioconductor packages using the following command
> BiocManager::install(c("pkg-name"), update = TRUE, ask = FALSE)
# If BiocManager is not found, install it first with the following command and re-run the above command.
> install.packages("BiocManager")

Figure 29. Installing R packages using the Terminal

For System binaries: You can also install the system binaries by running the following command directly on the terminal itself.

# System binaries (replace package-name with the package you need)
> sudo apt install package-name

# If the above command reports that the package cannot be found, update the system package index and run it again
> sudo apt-get update

Figure 30. Installing System libraries using the Terminal

Reusable Scripts

Polly Notebook also provides reusable scripts in every notebook. These are code snippets that are frequently needed during analysis, such as generic functions for data reading, normalization, and visualization. A script can be added to a code cell with a single click and executed as usual. The reusable scripts are available in a collapsible panel on the left, and you can use them at any time during your analysis.

Figure 31. Reusable scripts on Polly Notebook

When you select a reusable script, a collapsible panel opens on the right with information about that script's options and usage. You can also add your own reusable scripts to Polly Notebook to reuse them across analyses and save time.

Figure 32. Options and Information of Reusable scripts
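As an illustration of the kind of snippet you might save as your own reusable script, here is a minimal counts-per-million (CPM) normalization in plain Python. It is a hypothetical example, not one of Polly's built-in reusable scripts, and it assumes raw counts are held as a dictionary mapping each sample to its per-gene counts in a shared gene order; in practice you would more likely work with a pandas DataFrame, which the Python environments include.

```python
from typing import Dict, List


def counts_per_million(counts: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Scale each sample's counts so that they sum to one million (CPM).

    `counts` maps a sample name to its per-gene counts, all in the same gene order.
    """
    normalized: Dict[str, List[float]] = {}
    for sample, values in counts.items():
        total = sum(values)
        if total <= 0:
            raise ValueError(f"sample {sample!r} has no counts to normalize")
        normalized[sample] = [value / total * 1_000_000 for value in values]
    return normalized
```

For example, `counts_per_million({"s1": [1, 3], "s2": [5, 5]})` returns `{"s1": [250000.0, 750000.0], "s2": [500000.0, 500000.0]}`.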